Data Engineering

Forum Posts

isaac_gritz
by Valued Contributor II
  • 884 Views
  • 4 replies
  • 8 kudos

Databricks Runtime Support

How long are Databricks runtimes supported for? How often are they updated? You can learn more about the Databricks runtime support lifecycle here (AWS | Azure | GCP). Long Term Support (LTS) runtimes are released every 6 months and supported for 2 yea...

Latest Reply
Ajay-Pandey
Esteemed Contributor II
  • 8 kudos

Thanks for the update.

3 More Replies
Saikrishna2
by New Contributor III
  • 2664 Views
  • 7 replies
  • 11 kudos

Databricks SQL is allowing only 10 queries?

• Power BI is a publisher that uses AD group authentication to publish result sets. Since the publisher's credentials are maintained, the same user can access the Databricks database.
• A number of users are retrieving the data from Power BI or i...

Latest Reply
VaibB
Contributor
  • 11 kudos

I believe 10 is the limit as of now. See if you can increase the concurrency limit from the source.

6 More Replies
User16835756816
by Valued Contributor
  • 1891 Views
  • 4 replies
  • 11 kudos

How can I extract data from different sources and transform it into a fresh, reliable data pipeline?

Tip: These steps are built out for AWS accounts and workspaces that are using Delta Lake. If you would like to learn more, watch this video and reach out to your Databricks sales representative for more information. Step 1: Create your own notebook or ...

Latest Reply
Ajay-Pandey
Esteemed Contributor II
  • 11 kudos

Thanks @Nithya Thangaraj​ 

3 More Replies
him
by New Contributor III
  • 6697 Views
  • 8 replies
  • 5 kudos

I am getting the below error while making a GET request to a job in Databricks after successfully running it

"error_code": "INVALID_PARAMETER_VALUE",  "message": "Retrieving the output of runs with multiple tasks is not supported. Please retrieve the output of each individual task run instead."}

Latest Reply
SANKET
New Contributor II
  • 5 kudos

Use https://<databricks-instance>/api/2.1/jobs/runs/get?run_id=xxxx. "get-output" gives the details of a single run ID associated with a task, not the job.
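
To illustrate, here is a minimal sketch (not from the thread) of walking a multi-task run and fetching each task's output via the Jobs 2.1 API; the host and token environment variables and the run ID are assumptions:

    # Minimal sketch: per-task output for a multi-task job run (run ID is a placeholder).
    import os
    import requests

    host = os.environ["DATABRICKS_HOST"]  # e.g. https://<databricks-instance>
    headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

    # runs/get returns the parent run, including one entry per task
    run = requests.get(f"{host}/api/2.1/jobs/runs/get",
                       headers=headers, params={"run_id": 12345}).json()

    # runs/get-output only accepts individual task run IDs, not the parent run ID
    for task in run.get("tasks", []):
        out = requests.get(f"{host}/api/2.1/jobs/runs/get-output",
                           headers=headers, params={"run_id": task["run_id"]}).json()
        print(task["task_key"], out.get("notebook_output"))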

7 More Replies
yubin-apollo
by New Contributor II
  • 743 Views
  • 2 replies
  • 0 kudos

COPY INTO skipRows FORMAT_OPTIONS does not work

Based on the COPY INTO documentation, it seems I can use `skipRows` to skip the first `n` rows. I am trying to load a CSV file where I need to skip the first few rows. I have tried various combinations, e.g. setting the header parameter on or ...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 0 kudos

Hi @Yubin Park, can you write down the statement you are using to copy the data? Also, after copying, did you compare the record count between source and target and find that rows were not skipped?
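
For reference, a minimal sketch of a COPY INTO statement using skipRows (the table name, path, and option values are placeholders, and skipRows availability depends on the runtime version):

    # Hypothetical sketch: skip the first two rows of each CSV during COPY INTO.
    spark.sql("""
        COPY INTO my_schema.my_table
        FROM 's3://my-bucket/landing/'
        FILEFORMAT = CSV
        FORMAT_OPTIONS ('skipRows' = '2', 'header' = 'false', 'inferSchema' = 'true')
        COPY_OPTIONS ('mergeSchema' = 'true')
    """)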

1 More Replies
chhavibansal
by New Contributor II
  • 1194 Views
  • 4 replies
  • 1 kudos

ANALYZE TABLE showing NULLs for all statistics in Spark

var df2 = spark.read
  .format("csv")
  .option("sep", ",")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("src/main/resources/datasets/titanic.csv")
df2.createOrReplaceTempView("titanic")
spark.table("titanic").cach...

Latest Reply
chhavibansal
New Contributor II
  • 1 kudos

Can you share what *newtitanic* is? I think you would have done something similar: spark.sql("create table newtitanic as select * from titanic"). Something like this works for me, but the issue is I first make a temp view and then again create a tab...
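
For comparison, a minimal sketch of the materialize-then-analyze approach discussed here, assuming the titanic temp view already exists (the Age column is taken from the standard Titanic dataset):

    # Statistics populate for metastore tables, not temp views, so materialize first.
    spark.sql("CREATE TABLE newtitanic AS SELECT * FROM titanic")
    spark.sql("ANALYZE TABLE newtitanic COMPUTE STATISTICS FOR ALL COLUMNS")
    # Column-level stats should now show non-NULL min/max/count values.
    spark.sql("DESCRIBE EXTENDED newtitanic Age").show(truncate=False)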

3 More Replies
Jain
by New Contributor III
  • 1078 Views
  • 1 replies
  • 0 kudos

How to install GDAL on a Databricks cluster?

I am currently using Runtime 10.4 LTS. The options available on Maven Central do not work, nor do those on PyPI. To validate, I am running:
try:
    from osgeo import gdal
except ImportError:
    import gdal
but it throws ModuleNotFoundError: No module n...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 0 kudos

@Abhishek Jain I can understand your issue; it has happened to me multiple times as well. To solve it, I install an init script on my cluster. The major reason is that your 10.x runtime version does not support your current library, so you have to find the rig...
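
As a rough illustration of the init-script approach (a sketch only; the script path and package choices are assumptions, not a verified recipe):

    # Write a cluster init script that installs the GDAL system packages and the
    # matching Python bindings; attach it under the cluster's Advanced Options.
    script = "\n".join([
        "#!/bin/bash",
        "sudo apt-get update -y",
        "sudo apt-get install -y gdal-bin libgdal-dev",
        "/databricks/python/bin/pip install GDAL==$(gdal-config --version)",
    ])
    dbutils.fs.put("dbfs:/init-scripts/install-gdal.sh", script, overwrite=True)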

Slalom_Tobias
by New Contributor III
  • 4907 Views
  • 1 replies
  • 1 kudos

AttributeError: 'SparkSession' object has no attribute '_wrapped' when attempting CoNLL.readDataset()

I'm getting the error...
AttributeError: 'SparkSession' object has no attribute '_wrapped'
AttributeError Traceback (most recent call last)
<command-2311820097584616> in <cell li...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

This can happen on a 10.x runtime version; try 7.3 LTS and share your observations. If it does not work there, try creating an init script and loading it onto your Databricks cluster, so whenever your machine comes up you get the advantage of that library, because some...

rammy
by Contributor III
  • 660 Views
  • 1 replies
  • 5 kudos

Not able to parse .doc extension files using Scala in a Databricks notebook?

I was able to parse .doc extension files using Java with the help of the POI libraries, but when converting the Java code into Scala I expected it to work with the same Java libraries, yet it shows the below erro...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 5 kudos

Hi @Ramesh Bathini, in PySpark we have a docx module. I found that to be working perfectly fine. Can you try using that? Documentation and examples can be found online. Cheers...
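
If the suggestion refers to the python-docx package (an assumption; note it reads .docx files, so legacy .doc files would need converting first), a minimal sketch looks like:

    # Hypothetical sketch using python-docx (%pip install python-docx).
    from docx import Document

    doc = Document("/dbfs/FileStore/sample.docx")  # placeholder path
    text = "\n".join(p.text for p in doc.paragraphs)
    print(text[:500])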

Snuki
by New Contributor II
  • 810 Views
  • 4 replies
  • 3 kudos
Latest Reply
Harun
Honored Contributor
  • 3 kudos

I used to get these kinds of errors from the Databricks partner page; try manually searching for the course you are looking for. For example, when I used the link to navigate to the Data Lakehouse foundational course page, it showed the same error to me. I manu...

3 More Replies
db-avengers2rul
by Contributor II
  • 4280 Views
  • 2 replies
  • 0 kudos

Resolved! Delete files from a directory

Is there a way to recursively delete files using a command in notebooks? In the below directory I have many combinations of files like .txt, .png, .jpg, but I only want to delete the .csv files, e.g. dbfs:/FileStore/*.csv

Latest Reply
UmaMahesh1
Honored Contributor III
  • 0 kudos

Hi @Rakesh Reddy Gopidi, you can use the os module to iterate over a directory. Looping over the directory, you can check what each file ends with using .endswith(".csv"). After fetching all the matching files, you can remove them. Hope this helps. Cheers.
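
A minimal sketch of that loop (the directory path is a placeholder; note the /dbfs mount prefix for local file APIs):

    # Remove only .csv files from a DBFS directory via the os module.
    import os

    target = "/dbfs/FileStore"  # placeholder path
    for name in os.listdir(target):
        if name.endswith(".csv"):
            os.remove(os.path.join(target, name))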

1 More Replies
UmaMahesh1
by Honored Contributor III
  • 2241 Views
  • 2 replies
  • 15 kudos

Resolved! Pyspark dataframe column comparison

I have a string column which is a concatenation of elements with a hyphen as follows. Let 3 values from that column look like below:
Row 1 - A-B-C-D-E-F
Row 2 - A-B-G-C-D-E-F
Row 3 - A-B-G-D-E-F
I want to compare 2 consecutive rows and create a column ...

Latest Reply
NhatHoang
Valued Contributor II
  • 15 kudos

Hi, I think you can follow these steps:
1. Use a window function to create a new column by shifting; then your df will look like this:
id | value | lag
1 | A-B-C-D-E-F | null
2 | A-B-G-C-D-E-F | A-B-C-D-E-F
3 | A-B-G-D-E-F | ...
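
A minimal sketch of step 1 (column names are assumed; an explicit ordering column is required for lag):

    from pyspark.sql import SparkSession, Window, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(1, "A-B-C-D-E-F"), (2, "A-B-G-C-D-E-F"), (3, "A-B-G-D-E-F")],
        ["id", "value"],
    )

    # Shift each value down one row, then compare each row to the previous one.
    w = Window.orderBy("id")
    result = (df.withColumn("lag", F.lag("value").over(w))
                .withColumn("matches_previous", F.col("value") == F.col("lag")))
    result.show(truncate=False)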

1 More Replies
cozos
by New Contributor III
  • 1859 Views
  • 6 replies
  • 5 kudos

What does "ScalaDriverLocal: User Code Compile error" mean?

22/11/30 01:45:31 WARN ScalaDriverLocal: loadLibraries: Libraries failed to be installed: Set()
22/11/30 01:50:14 INFO Utils: resolved command to be run: WrappedArray(getconf, PAGESIZE)
22/11/30 01:50:15 WARN ScalaDriverLocal: User Code Compile err...

Latest Reply
cozos
New Contributor III
  • 5 kudos

Hi @Werner Stinckens, thanks for the help. Unfortunately I don't think it's so simple: I do have a JAR that I submitted as a Databricks JAR task, and the JAR does have the org.apache.beam class. I guess what I'm trying to understand is what does Scal...

5 More Replies
vr
by Contributor
  • 3055 Views
  • 12 replies
  • 9 kudos

Why is execution too fast?

I have a table, a full scan of which takes ~20 minutes on my cluster. The table has a "Time" TIMESTAMP column and a "day" DATE column. The latter is computed (manually) as "Time" truncated to day and used for partitioning. I query the table using a predicate ...
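
One way to sanity-check a surprisingly fast scan (a sketch; the table and column names follow the post's description, and my_table is a placeholder) is to look for partition filters in the query plan:

    # If the plan lists a PartitionFilters entry on "day", partition pruning
    # rather than a full scan likely explains the runtime.
    plan = spark.sql("""
        EXPLAIN FORMATTED
        SELECT * FROM my_table WHERE day = DATE'2022-12-01'
    """)
    plan.show(truncate=False)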

Latest Reply
Kaniz
Community Manager
  • 9 kudos

Hi @Vladimir Ryabtsev, we haven't heard from you since the last response from @Uma Maheswara Rao Desula, and I was checking back to see if their suggestions helped you. Otherwise, if you have a solution, please share it with the community, as it c...

11 More Replies