Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ms_221
by New Contributor II
  • 296 Views
  • 1 reply
  • 0 kudos

Need to load data from Databricks into a Snowflake table that has an auto-incrementing ID column

I want to load the data from a DataFrame df (say 3 columns: c1, c2, c3) into a Snowflake table, say test1, which has columns (c1, c2, c3) plus an ID autoincrement column. The df and the Snowflake table (test1) have the same column definitions and the same datatypes. In the target tabl...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

To load data from a DataFrame into a Snowflake table with an autoincrement ID column, you can follow these steps: First, ensure that your Snowflake table (test1) is created with an autoincrement ID column: CREATE OR REPLACE TABLE test1 ( ID INT AU...
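
A minimal sketch of the write step, assuming the Spark Snowflake connector with placeholder connection options; writing only c1, c2, c3 lets Snowflake populate the autoincrement ID:

sf_options = {
    "sfUrl": "<account>.snowflakecomputing.com",   # placeholders throughout
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

# write only the three data columns; Snowflake generates ID on insert
(df.select("c1", "c2", "c3")
   .write.format("snowflake")
   .options(**sf_options)
   .option("dbtable", "test1")
   .mode("append")
   .save())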

JonathanFlint
by New Contributor II
  • 1629 Views
  • 7 replies
  • 0 kudos

Asset bundle doesn't sync files to workspace

I've created a completely fresh project with a completely empty workspace. Locally I have the Databricks CLI version 0.230.0 installed. I run databricks bundle init default-python. I have auth set up with a PAT generated by an account which has workspace ad...

Latest Reply
Mathias_Peters
Contributor
  • 0 kudos

Hi, I had a similar problem today. I changed the way we deploy our main bundle using pull requests, and in order to play around with it locally, I copied Python and dbt code into the Databricks src dir (that is normally done during a GitHub work...

6 More Replies
Guigui
by New Contributor II
  • 460 Views
  • 3 replies
  • 0 kudos

Job start time timezone

It is mentioned in the documentation that job.start_time is a time value in the UTC timezone, but I wonder if that's always the case, because while start_time is in UTC for a scheduled job, it is in the local timezone when the job is manually trigge...

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

To determine whether a Databricks job was triggered manually or by schedule, you can use the dynamic value reference {{job.trigger.type}}. T
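
A minimal sketch of that idea, assuming the dynamic value reference is passed in as a job parameter (the parameter name trigger_type and the compared string are assumptions; check the actual values your jobs emit):

# job parameter "trigger_type" set to {{job.trigger.type}} in the job configuration
dbutils.widgets.text("trigger_type", "")
trigger_type = dbutils.widgets.get("trigger_type")

if trigger_type == "manual":          # assumed value for manual runs
    print("Run was triggered manually")
else:
    print(f"Run was triggered by: {trigger_type}")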

2 More Replies
RobDineen
by Contributor
  • 643 Views
  • 4 replies
  • 0 kudos

Resolved! Pyspark to_date not coping with single digit Day or Month

Hi there, I have a simple PySpark to_date function but it fails on days or months from 1-9, so is there a nice easy way to get round this at all? Regards, Rob

(attached screenshot: RobDineen_0-1731324661487.png)
Latest Reply
RobDineen
Contributor
  • 0 kudos

Resolved using format_string:

from pyspark.sql.functions import when, format_string

# zero-pad single-digit day numbers so to_date can parse them
dff = df.withColumn(
    "DayofMonthFormatted",
    when(df.DayofMonth.isin([1, 2, 3, 4, 5, 6, 7, 8, 9]),
         format_string("0%d", df.DayofMonth))
    .otherwise(df.DayofMonth))
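
A slightly shorter equivalent, assuming DayofMonth is a numeric column: the %02d format zero-pads 1-9 and leaves two-digit values unchanged, so the when/otherwise branch isn't needed:

from pyspark.sql.functions import format_string

dff = df.withColumn("DayofMonthFormatted", format_string("%02d", df.DayofMonth))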

3 More Replies
Avinash_Narala
by Valued Contributor II
  • 814 Views
  • 2 replies
  • 2 kudos

Fully serverless databricks SaaS

I'm exploring Databricks' fully serverless SaaS option, as shown in the attached image, which promises quick setup and $400 in initial credits. I'm curious about the pros and cons of using this fully serverless setup. Specifically, would this option b...

Latest Reply
gchandra
Databricks Employee
  • 2 kudos

There are: if you have Spark configs, custom JARs, or init scripts, they won't work. Please check this page for the long list of limitations: https://docs.databricks.com/en/compute/serverless/limitations.html

1 More Replies
rcostanza
by New Contributor III
  • 615 Views
  • 4 replies
  • 2 kudos

Resolved! Changing git's author field when committing through Databricks

I have a git folder to a Bitbucket repo. Whenever I commit something, the commit uses my Bitbucket username (the unique name) in the author field, making it less readable when I'm reading a list of commits. For example, commits end up like this: commi...

Latest Reply
yermulnik
New Contributor II
  • 2 kudos

We just found ourselves suffering from the same issue, since we enforced a GitHub ruleset that requires commit emails to match our org email pattern of `*@ourorgdomain.com`.

3 More Replies
dfish8124
by New Contributor
  • 335 Views
  • 1 reply
  • 0 kudos

Streaming IoT Hub Data Using Delta Live Table Pipeline

Hello, I'm attempting to stream IoT Hub data using a Delta Live Table pipeline. The issue I'm running into is that Event Hubs streaming isn't supported on shared clusters ([UNSUPPORTED_STREAMING_SOURCE_PERMISSION_ENFORCED] Data source eventhubs is not ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @dfish8124, is it possible to share your code with us?
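
Not part of the thread, but a commonly used workaround is to read Event Hubs through its Kafka-compatible endpoint with the kafka source; a sketch with placeholder namespace, hub name, and secret scope:

import dlt

EH_NAMESPACE = "<eventhubs-namespace>"    # placeholder
EH_NAME = "<eventhub-name>"               # placeholder
EH_CONN = dbutils.secrets.get("<scope>", "<eh-connection-string-key>")

KAFKA_OPTIONS = {
    "kafka.bootstrap.servers": f"{EH_NAMESPACE}.servicebus.windows.net:9093",
    "subscribe": EH_NAME,
    "kafka.security.protocol": "SASL_SSL",
    "kafka.sasl.mechanism": "PLAIN",
    "kafka.sasl.jaas.config": (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
        f'required username="$ConnectionString" password="{EH_CONN}";'
    ),
}

@dlt.table(name="iot_raw")
def iot_raw():
    return spark.readStream.format("kafka").options(**KAFKA_OPTIONS).load()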

Subhasis
by New Contributor II
  • 316 Views
  • 2 replies
  • 0 kudos

Small JSON files issue: taking 2 hours to read 3000 files

Hello, I am trying to read 3000 JSON files, each of which has only one record. It is taking 2 hours to read all the files. How can I perform this operation faster? Please suggest.

Latest Reply
Subhasis
New Contributor II
  • 0 kudos

This is the code:

df1 = spark.read.format("json").options(inferSchema="true", multiLine="true").load(file1)
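
A sketch of a likely faster pattern, assuming the files share a common layout: read the whole directory in one pass and supply an explicit schema so Spark skips per-file schema inference (field names and the path are placeholders):

from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField("id", StringType(), True),       # placeholder fields
    StructField("payload", StringType(), True),
])

df_all = (spark.read
          .schema(schema)
          .option("multiLine", "true")
          .json("dbfs:/path/to/json_dir/"))      # placeholder directory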

1 More Replies
MarkV
by New Contributor III
  • 287 Views
  • 2 replies
  • 0 kudos

DLT Runtime Values

When my pipeline runs, I have a need to query a table in the pipeline before I actually create another table. I need to know the target catalog and target schema for the query. I figured the notebook might run automatically in the context of the cata...

Latest Reply
SparkJun
Databricks Employee
  • 0 kudos

Can you set up notebook parameters and pass them in the DLT pipeline? https://docs.databricks.com/en/jobs/job-parameters.html
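
A minimal sketch of a related approach, assuming key/value pairs (the names target_catalog and target_schema are hypothetical) are set in the DLT pipeline's Configuration and read with spark.conf.get:

# values defined under the pipeline's Configuration settings
target_catalog = spark.conf.get("target_catalog")
target_schema = spark.conf.get("target_schema")

# hypothetical lookup before defining the next table
lookup_df = spark.table(f"{target_catalog}.{target_schema}.control_table")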

1 More Replies
NemesisMF
by New Contributor II
  • 437 Views
  • 4 replies
  • 2 kudos

Obtain refresh mode from within Delta Live Table pipeline run

Is it possible to somehow determine whether a DLT pipeline run is running in full refresh or incremental mode from within a notebook running in the pipeline? I looked into the pipeline configuration variables but could not find anything. It would be beneficial...

Latest Reply
NemesisMF
New Contributor II
  • 2 kudos

We found a solution where we do not need to determine the refresh mode anymore. But I still do not know how to get the current refresh mode of the current pipeline run from within a notebook that is running in the pipeline. This might still be be...

3 More Replies
SALP_STELLAR
by New Contributor
  • 565 Views
  • 1 reply
  • 0 kudos

AzureException: hadoop_azure_shaded.com.microsoft.azure.storage.StorageException: Server failed to a

Actually my first part of the code works fine:

dbutils.widgets.text("AutoLoanFilePath", "")
inputPath = dbutils.widgets.get("AutoLoanFilePath")
# inputPath = 'SEPT_2024/FAMILY_SECURITY'
autoPath = 'dbfs:/mnt/dbs_adls_mnt/Prod_landing/' + inputPath
autoLoa...

Latest Reply
SparkJun
Databricks Employee
  • 0 kudos

This looks like an authentication issue when trying to access Azure Blob Storage from your Databricks environment. Can you please check the storage credentials and the setup? Consider using an Azure AD service principal with the appropriate RBAC rol...
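
A sketch of the service-principal (OAuth) configuration for ADLS Gen2 access, with placeholder storage account, secret scope, and tenant values:

storage = "<storage-account>"   # placeholder
spark.conf.set(f"fs.azure.account.auth.type.{storage}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage}.dfs.core.windows.net",
               dbutils.secrets.get("<scope>", "<sp-client-id>"))
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage}.dfs.core.windows.net",
               dbutils.secrets.get("<scope>", "<sp-client-secret>"))
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage}.dfs.core.windows.net",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")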

L1000
by New Contributor III
  • 258 Views
  • 1 reply
  • 0 kudos

How to detect gap in filenames (Autoloader)

So my files arrive at the cloud storage and I have configured an Auto Loader to read these files. The files have a monotonically increasing ID in their name. How can I detect a gap and stop the DLT as soon as there is a gap? E.g. Autoloader finds file1, in...

Latest Reply
SparkJun
Databricks Employee
  • 0 kudos

It doesn't seem like this can be done through the DLT Auto Loader, particularly since you require an automatic stop without manual intervention. You can write a custom Structured Streaming job with sequence-checking logic, and use foreachBatch to process i...
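
A rough sketch of that idea; the filename pattern, paths, and sink are assumptions, and this only checks within each micro-batch (a production version would persist the last-seen ID across batches):

from pyspark.sql.functions import col, input_file_name, regexp_extract

def check_and_write(batch_df, batch_id):
    # collect the distinct file IDs seen in this micro-batch
    ids = sorted(r.file_id for r in
                 batch_df.select(col("file_id").cast("long"))
                         .dropna().distinct().collect())
    for prev, cur in zip(ids, ids[1:]):
        if cur != prev + 1:
            raise Exception(f"Gap detected: file {prev + 1} is missing")
    batch_df.write.mode("append").saveAsTable("target_table")   # placeholder sink

(spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")                      # assumption
      .load("/path/to/landing")                                 # placeholder
      .withColumn("file_id", regexp_extract(input_file_name(), r"file(\d+)", 1))
      .writeStream
      .option("checkpointLocation", "/path/to/checkpoint")      # placeholder
      .foreachBatch(check_and_write)
      .start())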

rvo19941
by New Contributor II
  • 279 Views
  • 1 reply
  • 0 kudos

Auto Loader with File Notification mode not picking up new files in Delta Live Tables pipeline

Dear, I am developing a Delta Live Table pipeline and use Auto Loader with File Notification mode to pick up files inside an Azure storage account (which is not the storage used by the default catalog). When I full refresh the target streaming table, ...

(attached screenshot: rvo19941_0-1730383733629.png)
Latest Reply
SparkJun
Databricks Employee
  • 0 kudos

Based on the error "Invalid configuration value detected for fs.azure.account.key", the pipeline was still trying to use an account key authentication method instead of service principal au...
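
A sketch of the Auto Loader options involved, with placeholder values; in file notification mode the service principal credentials and the event infrastructure settings are passed as cloudFiles options:

df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")                        # assumption
      .option("cloudFiles.useNotifications", "true")
      .option("cloudFiles.clientId",
              dbutils.secrets.get("<scope>", "<sp-client-id>"))
      .option("cloudFiles.clientSecret",
              dbutils.secrets.get("<scope>", "<sp-client-secret>"))
      .option("cloudFiles.tenantId", "<tenant-id>")
      .option("cloudFiles.subscriptionId", "<subscription-id>")
      .option("cloudFiles.resourceGroup", "<resource-group>")
      .load("abfss://<container>@<account>.dfs.core.windows.net/<path>"))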

dipali_globant
by New Contributor II
  • 432 Views
  • 2 replies
  • 0 kudos

Parsing a JSON string value column into a DataFrame structure

Hi All, I have to read a Kafka payload which has a value column containing a JSON string, but the format of the JSON is as below: { "data": [ { "p_al4": "N/A", "p_a5": "N/A", "p_ad": "OA003", "p_aName": "Abc", "p_aFlag": true, ....(dynamic)} ] } In the data key it can ...

Latest Reply
dipali_globant
New Contributor II
  • 0 kudos

No, I don't know the elements in the JSON, so I can't define the structure.
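
One possible sketch for that case: infer the schema at runtime from a sample message with schema_of_json, then explode the data array. This assumes one sample is representative; keys missing from the sample will be dropped:

from pyspark.sql.functions import col, explode, from_json, lit, schema_of_json

# take one raw message and derive a DDL schema string from it
sample = df.select(col("value").cast("string")).first()[0]
schema_ddl = spark.range(1).select(schema_of_json(lit(sample))).first()[0]

parsed = (df.select(from_json(col("value").cast("string"), schema_ddl).alias("j"))
            .select(explode(col("j.data")).alias("rec"))
            .select("rec.*"))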

1 More Replies
mickniz
by Contributor
  • 26794 Views
  • 8 replies
  • 18 kudos

cannot import name 'sql' from 'databricks'

I am working on a Databricks version 10.4 premium cluster, and while importing sql from the databricks module I am getting the error below: cannot import name 'sql' from 'databricks' (/databricks/python/lib/python3.8/site-packages/databricks/__init__.py). Trying...

Latest Reply
ameet9257
Contributor
  • 18 kudos

If you ever receive this kind of error after installing the correct Python package, try running the command below: dbutils.library.restartPython()
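
For context, a sketch assuming the goal is the Databricks SQL Connector, which provides databricks.sql; install, restart Python, then import (run the install and restart in their own cell; hostname, path, and token are placeholders):

%pip install databricks-sql-connector
dbutils.library.restartPython()

from databricks import sql

with sql.connect(server_hostname="<workspace-host>",
                 http_path="<warehouse-http-path>",
                 access_token="<personal-access-token>") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        print(cur.fetchall())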

7 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group