Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

gpzz
by New Contributor III
  • 3133 Views
  • 2 replies
  • 1 kudos

MEMORY_ONLY not working

val doubledAmount = premiumCustomers.map(x => (x._1, x._2 * 2)).persist(StorageLevel.MEMORY_ONLY)
error: not found: value StorageLevel

Latest Reply
Chaitanya_Raju
Honored Contributor
  • 1 kudos

Hi @Gaurav Poojary, can you please try the code below, as displayed in the image? It is working for me without any issues. Happy Learning!!

1 More Replies
bozhu
by Contributor
  • 3076 Views
  • 3 replies
  • 3 kudos

Set taskValues in DLT workbooks

Is "setting taskValues in DLT workbooks" supported? I tried setting a task value in a DLT workbook, but it does not seem to be supported, so downstream workbooks within the same Workflows job cannot consume this task value.

Latest Reply
Lê_Ngọc_Lợi
New Contributor III
  • 3 kudos

I have the same issue. I also want to know whether Databricks supports taskValues between a job task and DLT or not.
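
For context, in a regular (non-DLT) notebook task, task values are written and read through `dbutils.jobs.taskValues`. The sketch below wraps those two calls. The helper names, the task key `upstream_task`, and the value key `row_count` are hypothetical, and `dbutils` is passed in explicitly only to keep the helpers self-contained. It does not show that the same call works inside a DLT pipeline, which is the open question in this thread.

```python
def publish_row_count(dbutils, row_count):
    """Run in the upstream notebook task: store a small value for later tasks."""
    dbutils.jobs.taskValues.set(key="row_count", value=row_count)


def read_row_count(dbutils, upstream_task="upstream_task"):
    """Run in a downstream notebook task of the same job run."""
    return dbutils.jobs.taskValues.get(
        taskKey=upstream_task,  # hypothetical name of the task that set the value
        key="row_count",
        default=0,              # returned when the key was never set
        debugValue=0,           # returned when the notebook runs outside a job
    )
```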

2 More Replies
Vik1
by New Contributor II
  • 12554 Views
  • 3 replies
  • 5 kudos

Some very simple functions in Pandas on Spark are very slow

I have a pandas on Spark dataframe with 8 million rows and 20 columns. It took 3.48 minutes to run df.shape, and df.head also takes a long time: it took 4.55 minutes. By contrast, df.var1.value_counts().reset_index() took only 0.18 sec...

Latest Reply
PeterDowdy
New Contributor II
  • 5 kudos

This is slow because pandas needs an index column to perform `shape` or `head`. If you don't provide one, pandas on Spark enumerates the entire dataframe to create a default one. For example, given columns A, B, and C in dataframe `d...
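
One way to act on this, assuming the data starts out as a Spark DataFrame and already has a suitable column: name that column as the index when converting, so that no default index has to be generated. This is a sketch, not a measured fix for the timings above. The helper name is hypothetical and column `A` is used as the index only for illustration; `pandas_api(index_col=...)` is a method on Spark DataFrames in recent PySpark releases.

```python
def to_indexed_pandas_on_spark(sdf, index_col="A"):
    """Convert a Spark DataFrame to pandas on Spark, reusing an existing column as index.

    `sdf` is a pyspark.sql.DataFrame; `index_col` must name a column that
    already exists in it (column "A" is only an example).
    """
    if index_col not in sdf.columns:
        raise ValueError(f"column {index_col!r} not found in {sdf.columns}")
    return sdf.pandas_api(index_col=index_col)
```

The returned object behaves like the pandas on Spark dataframe in the question, with `A` as its index instead of a generated one.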

2 More Replies
sunil_smile
by Contributor
  • 7748 Views
  • 2 replies
  • 1 kudos

VNet peering setting is not enabled in Azure Databricks Premium, even though it's deployed inside my VNet?

Hi all, the VNet peering setting is not enabled in Azure Databricks, even though it's deployed inside my VNet? I have not shown my VNet and subnet details here, but I filled them in and created Databricks (without private endpoint - allow public access) virtual...

Latest Reply
Debayan
Databricks Employee
  • 1 kudos

Hi, VNet peering is not supported or possible on VNet-injected workspaces. Please refer to: https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/vnet-peering#requirements

1 More Replies
patdev
by New Contributor III
  • 4286 Views
  • 2 replies
  • 2 kudos

Load new data into a Delta table

Hello all, I want to know how to load new data into a Delta table from a new CSV file. Here is the code that I used to create the Delta table from a CSV file and load the data, but I have got a new updated file and am trying to load the new data but am not able to. Any gui...

Latest Reply
patdev
New Contributor III
  • 2 kudos

Thank you, I tried that and it ended in an error. The tables created with Delta are from CSV, which must have been converted to Parquet files, and all the columns are varchar or string. So now if I want to load a new file, it ends in an incompatibility error for da...
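
A common way around this kind of type mismatch is to parse the new CSV with the existing table's schema before appending, so the incoming columns have the same types as the table. The sketch below assumes the table already exists, the CSV has a header row, and its columns are in the same order as the table's (a user-supplied schema is applied to CSV columns by position). The helper name and its arguments are hypothetical, not code from this thread.

```python
def append_csv_to_delta(spark, csv_path, table_name):
    """Append a CSV file to an existing Delta table, reusing the table's schema."""
    target_schema = spark.table(table_name).schema
    new_rows = (
        spark.read
        .option("header", "true")
        .schema(target_schema)   # parse each column with the table's type
        .csv(csv_path)
    )
    # mode("append") adds rows; it does not update or replace existing ones.
    new_rows.write.format("delta").mode("append").saveAsTable(table_name)
    return new_rows
```

If the new file is meant to update existing rows rather than add to them, an append is not enough and a merge would be needed instead.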

1 More Replies
sunil_smile
by Contributor
  • 20027 Views
  • 8 replies
  • 10 kudos

Resolved! How can I add ADLS Gen2 - OAuth 2.0 as cluster scope for my High Concurrency shared cluster (without Unity Catalog)?

Hi all, kindly help me: how can I add the ADLS Gen2 OAuth 2.0 authentication to my high concurrency shared cluster? I want to scope this authentication to the entire cluster, not to a particular notebook. Currently I have added them as Spark configuration o...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 10 kudos

The error is because of missing default settings (create a new cluster and do not remove them). The warning is because secrets should be put in a secret scope, and then you should reference the secrets in the settings.
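
To illustrate the second point, here is a sketch that builds the cluster-level Spark config entries for ADLS Gen2 OAuth with the client secret given as a `{{secrets/<scope>/<key>}}` reference instead of a plain value. The helper names and every argument value are placeholders; the keys follow the documented `spark.hadoop.fs.azure.account.*` pattern for service principal access.

```python
def adls_oauth_spark_conf(storage_account, tenant_id, client_id,
                          secret_scope, secret_key):
    """Build cluster-level Spark config entries for ADLS Gen2 OAuth."""
    host = f"{storage_account}.dfs.core.windows.net"
    prefix = "spark.hadoop.fs.azure.account"
    return {
        f"{prefix}.auth.type.{host}": "OAuth",
        f"{prefix}.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"{prefix}.oauth2.client.id.{host}": client_id,
        # Reference to a secret scope entry; the secret itself never appears here.
        f"{prefix}.oauth2.client.secret.{host}":
            f"{{{{secrets/{secret_scope}/{secret_key}}}}}",
        f"{prefix}.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def to_cluster_config_text(conf):
    """Render the dict as 'key value' lines for the cluster's Spark config box."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())
```

The rendered lines would be added to the cluster's existing Spark config, leaving the default entries in place as the reply advises.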

7 More Replies
JesseS
by New Contributor II
  • 9492 Views
  • 2 replies
  • 1 kudos

Resolved! How to extract source data from on-premise databases into a data lake and load with AutoLoader?

Here is the situation I am working with. I am trying to extract source data using the Databricks JDBC connector, with SQL Server databases as my data source. I want to write those into a directory in my data lake as JSON files, then have AutoLoader ing...

Latest Reply
Aashita
Databricks Employee
  • 1 kudos

To add to @werners' point, I would use ADF to load SQL Server data into ADLS Gen2 as JSON. Then load these raw JSON files from your ADLS base location into a Delta table using Auto Loader. Delta Live Tables can be used in this scenario. You can also reg...
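
The Auto Loader half of that suggestion could look roughly like the sketch below, written as plain Structured Streaming rather than as a Delta Live Tables pipeline. The helper name and all four locations are hypothetical; the `cloudFiles` source and its `cloudFiles.format` and `cloudFiles.schemaLocation` options are the Auto Loader settings for reading JSON with schema inference.

```python
def ingest_raw_json(spark, source_dir, schema_dir, checkpoint_dir, target_table):
    """Incrementally load JSON files from a data lake directory into a Delta table."""
    stream = (
        spark.readStream
        .format("cloudFiles")                             # Auto Loader source
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", schema_dir)  # where the inferred schema is tracked
        .load(source_dir)
    )
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_dir)     # remembers which files were processed
        .trigger(availableNow=True)                       # process what is there, then stop
        .toTable(target_table)
    )
```

With the `availableNow` trigger the job can be scheduled after each ADF copy run and only picks up files it has not seen before.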

1 More Replies
databicky
by Contributor II
  • 1585 Views
  • 1 replies
  • 0 kudos

Resolved! How to create border for some specific cells?

I tried some code to create a border in an Excel sheet. For a particular cell I am able to write it, but when I try with a set of cells it shows an error.

Latest Reply
Chaitanya_Raju
Honored Contributor
  • 0 kudos

Hi @Mohammed sadamusean, can you please try something similar to the code below, using loops? I have implemented a similar use case that might be useful. Please let me know if you need further help.
top = Side(border_style = 'thin', color = '00000000')
bottom = Side(bor...
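
Since the reply's code is cut off, here is a minimal sketch of the loop approach it describes, assuming openpyxl. The helper name and the range `B2:D4` are made up for illustration. Borders are set per cell, so a range has to be walked cell by cell; indexing a worksheet with a multi-cell range yields rows of cells.

```python
def apply_border(worksheet, cell_range, border):
    """Assign `border` to every cell in a rectangular range such as "B2:D4".

    `worksheet` is an openpyxl worksheet and `border` an openpyxl.styles.Border.
    Returns the number of cells that were styled.
    """
    count = 0
    for row in worksheet[cell_range]:   # each row is a tuple of cells
        for cell in row:
            cell.border = border
            count += 1
    return count


# Usage (needs openpyxl installed):
# from openpyxl import Workbook
# from openpyxl.styles import Border, Side
# thin = Side(border_style="thin", color="00000000")
# wb = Workbook()
# apply_border(wb.active, "B2:D4", Border(top=thin, bottom=thin, left=thin, right=thin))
# wb.save("bordered.xlsx")
```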

sreedata
by New Contributor III
  • 6246 Views
  • 4 replies
  • 7 kudos

Resolved! Getting status of "If Condition" Activity into a variable

"If Condition" has a lot of activities that can succeed or fail. If any activity fails, then the whole "If Condition" fails. I have to get the status of the "If Condition" activity (pass or fail) so that I can use it for processing in the next notebook t...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 7 kudos

In your ADF Pipeline activity, set two different pipeline activities from the If condition activity based on success or failure (the green and red arrows). Then inside each pipeline activity, you can add a set variable, get variable and your adb note...

3 More Replies
Chanu
by New Contributor II
  • 3081 Views
  • 2 replies
  • 2 kudos

Databricks JAR task type functionality

Hi, I would like to understand Databricks JAR-based workflow tasks. Can I interpret JAR-based runs as something like a spark-submit on a cluster? In the logs, I was expecting to see spark-submit --class com.xyz --num-executors 4, etc. And, the...

Latest Reply
Chanu
New Contributor II
  • 2 kudos

Hi, I did try using Workflows > Jobs > Create Task > JAR task type, uploaded my JAR and class, created a job cluster, and tested this task. This JAR reads some tables as input, does some transformations, and writes the output to some other tables. I would like t...

1 More Replies
pasiasty2077
by New Contributor
  • 8823 Views
  • 1 replies
  • 1 kudos

Partition filter is skipped when table is used in where condition, why?

Hi, maybe someone can help me. I want to run a very narrow query:
SELECT * FROM my_table WHERE snapshot_date IN ('2023-01-06', '2023-01-07')
-- part of the physical plan:
-- Location: PreparedDeltaFileIndex [dbfs:/...]
-- PartitionFilters: [cast(snaps...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

No hints on partition pruning, afaik. The reason the partitions were not pruned is that the second query generates a completely different plan. To be able to filter the partitions, a join first has to happen. And in this case it means the table has...
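
One workaround people use in this situation, which is not something proposed in the thread itself, is to split the work in two: fetch the handful of filter values first, then build the main query with those values as literals so it has the same shape as the first query in the question. The sketch below covers the second step only. The helper name is hypothetical, and it assumes the filter values are ISO dates for the `snapshot_date` partition column.

```python
from datetime import date


def build_snapshot_query(table, snapshot_dates):
    """Return a query that filters the partition column with literal dates."""
    if not snapshot_dates:
        raise ValueError("need at least one snapshot date")
    # Round-trip through date.fromisoformat so only valid ISO dates reach the SQL text.
    literals = ", ".join(
        f"'{date.fromisoformat(str(d)).isoformat()}'" for d in snapshot_dates
    )
    return f"SELECT * FROM {table} WHERE snapshot_date IN ({literals})"
```

The dates would come from collecting the small lookup query on the driver first. Whether this beats the join depends on how many values there are, so it is worth checking the physical plan again afterwards.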

sudhanshu1
by New Contributor III
  • 5392 Views
  • 4 replies
  • 2 kudos

Resolved! DLT workflow failing to read files from AWS S3

Hi all, I am trying to read streams directly from AWS S3. I set the instance profile, but when I run the workflow it fails with the error below: "No AWS Credentials provided by TemporaryAWSCredentialsProvider : shaded.databricks.org.apache.hadoop.fs.s3a.C...

Latest Reply
Vivian_Wilfred
Databricks Employee
  • 2 kudos

Hi @SUDHANSHU RAJ, is UC enabled on this workspace? What is the access mode set on the cluster? Is this coming from the metastore or directly when you read from S3? Is the S3 cross-account?

3 More Replies
alxsbn
by Contributor
  • 4327 Views
  • 2 replies
  • 2 kudos

Resolved! Autoloader on CSV file didn't correctly infer cells with JSON data

Hello! I'm playing with Auto Loader schema inference on a big S3 repo with 300+ tables and large CSV files. I'm looking at Auto Loader with great attention, as it can be a great time saver in our ingestion process (data comes from a transactional DB gen...

Latest Reply
daniel_sahal
Databricks MVP
  • 2 kudos

PySpark uses \ as the escape character by default. You can change it to ". Doc: https://docs.databricks.com/ingestion/auto-loader/options.html#csv-options
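
As a sketch of what that change looks like with Auto Loader: the `escape` option is set to a double quote so that a doubled quote inside a quoted cell (the usual way JSON ends up embedded in CSV) is read as a literal quote. The helper name and both paths are hypothetical, and the `header` setting is an assumption about the files.

```python
CSV_OPTIONS = {
    "cloudFiles.format": "csv",
    "header": "true",   # assumption: files have a header row
    "quote": '"',
    "escape": '"',      # read "" inside a quoted cell as a literal quote
}


def read_csv_with_json_cells(spark, source_dir, schema_dir):
    """Open an Auto Loader stream over CSV files whose cells contain quoted JSON."""
    return (
        spark.readStream
        .format("cloudFiles")
        .options(**CSV_OPTIONS)
        .option("cloudFiles.schemaLocation", schema_dir)
        .load(source_dir)
    )
```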

1 More Replies
Victhor
by New Contributor III
  • 11141 Views
  • 2 replies
  • 12 kudos
Latest Reply
chanshing
New Contributor III
  • 12 kudos

@Kaniz Fatma Is that tool (dbvim) still maintained? It looks like it has been abandoned and there are a couple of unresolved issues. Are there any plans to support vim keybindings in Databricks? This is possible in many other web-based editors such a...

1 More Replies
DeveloperAmarde
by New Contributor
  • 3892 Views
  • 1 replies
  • 0 kudos

Connection to Collibra

Hi team, I want to connect to Collibra to fetch details from it. Currently we are using a username and password to connect. I want to know the recommended practice for connecting to a Collibra account from a Databricks notebook.

Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, could you please let us know if this helps? https://marketplace.collibra.com/listings/jdbc-driver-for-databricks/
