Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Ancil
Contributor II
  • 1603 Views
  • 1 replies
  • 1 kudos

PythonException: 'RuntimeError: The length of output in Scalar iterator pandas UDF should be the same with the input's; however, the length of output was 1 and the length of input was 2.'.

I have a pandas_udf that works for 4 rows, but when I try more than 4 rows I get the error below. PythonException: 'RuntimeError: The length of output in Scalar iterator pandas UDF should be the same with the input's; however, the length of output was...

Latest Reply
Ancil
Contributor II
  • 1 kudos

@Kaniz Fatma Can you please help me with pandas_udf? In the scenario above I used regular expressions, and Spark has its own method for that, but I have other pandas_udfs with the same issue.
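For context, a Scalar iterator pandas UDF receives an iterator of pd.Series batches and must yield, for each batch, a Series with exactly as many rows as that batch. The original UDF isn't shown, so this is an assumption, but the error typically appears when the function yields one value per batch (for example `pd.Series([value])`) instead of one value per row. A minimal sketch of a regex UDF body that keeps the lengths aligned; the function name `extract_digits` and the pattern are hypothetical:

```python
from typing import Iterator

import pandas as pd


def extract_digits(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    """Body of a scalar iterator pandas UDF (hypothetical example).

    Spark passes an iterator of pandas Series, one per Arrow batch; for each
    batch this yields exactly one Series with the same number of rows.
    """
    for batch in batches:
        # Vectorised and row-wise: one extracted value per input row,
        # NaN where the pattern does not match.
        result = batch.str.extract(r"(\d+)", expand=False)
        # Fail early with a clear message if the row counts ever diverge,
        # which is the condition Spark reports as a RuntimeError.
        if len(result) != len(batch):
            raise ValueError(f"expected {len(batch)} rows, got {len(result)}")
        yield result
```

To use it in PySpark, wrap it with `pandas_udf(extract_digits, "string")`; the `Iterator[pd.Series] -> Iterator[pd.Series]` type hints mark it as a scalar iterator UDF. For a plain regex extraction, Spark's built-in `regexp_extract` avoids the UDF altogether.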

tatekeller
New Contributor
  • 1692 Views
  • 1 replies
  • 0 kudos

Can you access a repo file in an init script?

I'd like to configure a cluster with Python libraries as defined in a requirements file. I have a pip requirements.txt file in a private repo that I have integrated with Databricks (I can access it through the UI and view it on Databricks). I upda...

Latest Reply
sher
Valued Contributor II
  • 0 kudos

You can install the libraries on the cluster itself.
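One way to install the libraries on the cluster without an init script (an assumption, not something the thread confirms) is to turn the requirements file into a request for the Libraries API install endpoint, `POST /api/2.0/libraries/install`, which takes a `cluster_id` and a list of `pypi` library specs. The helper names and the cluster ID are hypothetical, and the parser is deliberately simple:

```python
import json
from typing import Dict, List


def requirements_to_libraries(requirements_text: str) -> List[Dict[str, Dict[str, str]]]:
    """Convert simple pip requirement lines into Libraries API `pypi` specs.

    Deliberately minimal: it drops comments and blank lines and skips pip
    options such as "-r other.txt" or "--index-url ...".
    """
    libraries = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line or line.startswith("-"):
            continue
        libraries.append({"pypi": {"package": line}})
    return libraries


def build_install_payload(cluster_id: str, requirements_text: str) -> str:
    """JSON body for POST /api/2.0/libraries/install (cluster_id is a placeholder)."""
    return json.dumps(
        {"cluster_id": cluster_id, "libraries": requirements_to_libraries(requirements_text)}
    )
```

The requirements text could be read from the repo file in a notebook; sending the request with a workspace token is left out of this sketch.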

KVNARK
Honored Contributor II
  • 865 Views
  • 1 replies
  • 5 kudos

Accessing a secret from a Spark cluster

Passing the Spark configuration to access Blob Storage and ADLS from Data Factory while creating a job cluster works fine, but when the property accesses a secret it does not work: spark.hadoop.fs.azure.account.auth.type.{{secrets/scope/key}}.dfs.core.wi...

Latest Reply
sher
Valued Contributor II
  • 5 kudos

Check here: https://docs.databricks.com/security/secrets/secrets.html
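The property in the question puts the secret reference inside the property name. As far as I know from the linked secrets page, a reference is resolved when it is the property value, written as `spark.<property-name> {{secrets/<scope-name>/<secret-name>}}`, so the storage account name has to appear literally in the key. A minimal sketch that builds a cluster `spark_conf` for OAuth access to ADLS Gen2 this way; the function name, storage account, scope, tenant, and key names are all placeholders:

```python
from typing import Dict


def adls_oauth_spark_conf(storage_account: str, scope: str, tenant_id: str,
                          client_id_key: str, client_secret_key: str) -> Dict[str, str]:
    """Spark config for OAuth access to ADLS Gen2 (hypothetical helper).

    Secret references appear only in values; property names contain the
    literal storage account name.
    """
    suffix = f"{storage_account}.dfs.core.windows.net"

    def secret(key: str) -> str:
        # Reference syntax resolved by Databricks when used as a property value.
        return "{{secrets/" + scope + "/" + key + "}}"

    # Cluster-level Hadoop settings take the "spark.hadoop." prefix.
    prefix = "spark.hadoop.fs.azure.account"
    return {
        f"{prefix}.auth.type.{suffix}": "OAuth",
        f"{prefix}.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"{prefix}.oauth2.client.id.{suffix}": secret(client_id_key),
        f"{prefix}.oauth2.client.secret.{suffix}": secret(client_secret_key),
        f"{prefix}.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }
```

The resulting dictionary could be passed as the job cluster's Spark configuration from Data Factory; whether client IDs are also stored as secrets is a choice, not a requirement.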

sonali1996
New Contributor
  • 1213 Views
  • 2 replies
  • 0 kudos

Adding a widget value as a column and populating it on every run in a table

Hi, I want to pass the run date from ADF as @utcnow() (a base parameter of the notebook activity in ADF) and read it in ADB through a widget named runtime_date. I then want that value added as a column in my table X, populated from the widget. Eve...

Latest Reply
sher
Valued Contributor II
  • 0 kudos

You can use current_timestamp() or now(). Refer to: https://docs.databricks.com/sql/language-manual/functions/current_timestamp.html

Ajay-Pandey
Esteemed Contributor III
  • 24910 Views
  • 6 replies
  • 7 kudos

Resolved! What does "Determining location of DBIO file fragments..." mean, and how do I speed it up?

Determining location of DBIO file fragments. This operation can take some time. What does this mean, and how do I prevent it from having to perform this apparently expensive operation every time? This happens even when all the underlying tables are De...

Latest Reply
Christianben9
New Contributor II
  • 7 kudos

"Determining location of DBIO file fragments" is a message that may be displayed during the boot process of a computer running the NetApp Data ONTAP operating system. This message indicates that the system is currently in the process of identifying an...

cgrant
Databricks Employee
  • 3515 Views
  • 4 replies
  • 6 kudos

How do I know how much of a query/job used Photon?

I'm trying to use the native execution engine, Photon. How can I tell if a query is using Photon or is falling back to the non-native Spark engine?

Latest Reply
venkat09
New Contributor III
  • 6 kudos

Correction to the second point of my previous post: click the execution plan of your task (available under the SQL/DataFrame tab in the Spark UI). It shows which operations ran in the Photon engine and which did not.
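As a rough complement to reading the plan in the Spark UI, the text of a physical plan (for example, copied from `EXPLAIN` output) can be scanned for operator names. This sketch assumes Photon operators appear with a `Photon` prefix (such as `PhotonProject`) in the default tree-formatted plan and counts every other operator line as non-Photon; it is a heuristic with a hypothetical function name, not an official API:

```python
import re
from collections import Counter

# Operator name at the start of a plan line, after tree-drawing characters
# ("+-", ":", "|", spaces) and an optional whole-stage-codegen marker "*(n)".
_OPERATOR = re.compile(r"^[\s+\-:|]*(?:\*\(\d+\)\s*)?([A-Za-z][A-Za-z0-9]*)")


def summarize_photon_usage(plan_text: str) -> Counter:
    """Count plan operators whose names start with "Photon" versus all others."""
    counts = Counter()
    for line in plan_text.splitlines():
        match = _OPERATOR.match(line)
        if not match:
            continue  # headers such as "== Physical Plan ==" and blank lines
        name = match.group(1)
        counts["photon" if name.startswith("Photon") else "other"] += 1
    return counts
```

A non-zero `other` count points at operators worth checking in the Spark UI plan, since those are candidates for fallback to the non-Photon engine.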

patdev
New Contributor III
  • 5972 Views
  • 9 replies
  • 2 kudos

TEXT data type not supported: how do I bring over a field containing huge text data?

Hello all, I have a medical data file, and one of the fields is a text field with huge data. Now the big problem is that Databricks does not support the text data type, so how can I bring the data over? I tried conversion and casting in various ways, but so far not ...

Latest Reply
patdev
New Contributor III
  • 2 kudos

Setting escapeQuotes to false helped bring the huge text data into the column. Thanks.

Gaurav_784295
New Contributor III
  • 2426 Views
  • 2 replies
  • 0 kudos

pyspark.sql.utils.AnalysisException: Non-time-based windows are not supported on streaming DataFrames/Datasets

pyspark.sql.utils.AnalysisException: Non-time-based windows are not supported on streaming DataFrames/Datasets. I get this error while writing. Can anyone please tell me how to resolve it?

Latest Reply
Gaurav_784295
New Contributor III
  • 0 kudos

I'm trying to run a query on a table and then store the result in another table: query = stream.writeStream .format("delta") .foreachBatch(batch_function) .option('checkpointLocation', self.checkpoint_loc) .trigger(processingTime...

ty2
New Contributor II
  • 2290 Views
  • 3 replies
  • 1 kudos

Resolved! How to start my cluster

I stopped my_cluster from Compute using the admin role, but using the same account I could not restart my_cluster. What should I do?

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

It seems this is Community Edition, where this feature is disabled. Delete this cluster and create a new one.

Sujitha
Databricks Employee
  • 816 Views
  • 1 replies
  • 2 kudos

Documentation Update January 13 - 19

Documentation Update January 13 - 19. Databricks documentation provides how-to guidance and reference information for data analysts, data scientists, and data engineers working in the Databricks Data Science & Engineering, Databricks Machine Learning, ...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 2 kudos

Thanks for the details.

vk217
Contributor
  • 1634 Views
  • 1 replies
  • 0 kudos

Resolved! Import course material to Databricks

I signed up for the data engineering course and downloaded the course material. However, I cannot access the link to import the course material into Databricks. The link below gives me "access denied": https://www.databricks.training/step-by-step/importing-co...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 0 kudos

Use this link: https://github.com/databricks-academy/data-engineering-with-databricks-english. Download it to your local machine and then import it; it will work.

Chris_Konsur
New Contributor III
  • 8256 Views
  • 1 replies
  • 0 kudos

Resolved! Configuring the Databricks Jobs API returns Error 403 User not authorized

I’m configuring the Databricks Jobs API and I get Error 403 User not authorized. I found out that I need to apply a role and set API permissions for AzureDatabricks: Azure Portal > Azure Databricks > Azure Databricks Service > Access control (IAM...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 0 kudos

The user who is trying to start the job needs access or run permission on that job. Grant the required permission and it will work.
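As a sketch of granting that permission programmatically, the Databricks Permissions API accepts a `PATCH` to `/api/2.0/permissions/jobs/{job_id}` with an access control list (a `PUT` replaces the whole list instead). The helper name, job ID, and user below are placeholders, and the helper only builds the request; sending it with a token that is allowed to manage the job is left out:

```python
from typing import Any, Dict, Tuple

# Permission levels that can be granted on a job.
JOB_PERMISSION_LEVELS = {"CAN_VIEW", "CAN_MANAGE_RUN", "CAN_MANAGE", "IS_OWNER"}


def job_permission_request(job_id: int, user_name: str,
                           level: str = "CAN_MANAGE_RUN") -> Tuple[str, str, Dict[str, Any]]:
    """Return (method, path, body) for adding one user's permission on a job."""
    if level not in JOB_PERMISSION_LEVELS:
        raise ValueError(f"unknown job permission level: {level}")
    path = f"/api/2.0/permissions/jobs/{job_id}"
    body = {"access_control_list": [{"user_name": user_name, "permission_level": level}]}
    # PATCH adds or updates entries without replacing the existing list.
    return "PATCH", path, body
```

`CAN_MANAGE_RUN` is used as the default because it lets a user trigger and cancel runs without managing the job definition.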

asif5494
New Contributor III
  • 2139 Views
  • 3 replies
  • 0 kudos

preAction in Databricks while writing into a Google BigQuery table?

I am writing into a Google BigQuery table using append mode. I need to delete the current day's data before writing new data. Is there a preActions parameter that can be used to delete data before writing into the table? Below is the s...

Latest Reply
Cami
Contributor III
  • 0 kudos

Can you use overwrite mode instead of append?

Neli
New Contributor III
  • 4092 Views
  • 2 replies
  • 0 kudos

How to add the current date as a column in Databricks

I am trying to create a new column "Ingest_date" in a table that should contain the current date. I get the error "Current date cannot be used in a generated column". Can you please review and suggest an alternative way to get the current date into a Delta table?

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

"A generation expression can use any SQL functions in Spark that always return the same result when given the same argument values." Source: https://docs.delta.io/latest/delta-batch.html#use-generated-columns. This means it is intentionally not supported. You ca...

