Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

LoiNguyen
by New Contributor II
  • 18095 Views
  • 5 replies
  • 2 kudos

The authentication type 10 is not supported

I use the code below to connect to PostgreSQL. df = spark.read \ .jdbc("jdbc:postgresql://hostname:5432/dbname", "schema.table", properties={"user": "user", "password": "password"})\ .load() df.printSchema() However, I got the ...
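For context, the "authentication type 10" error usually means the PostgreSQL server requires scram-sha-256 authentication and the JDBC driver on the cluster is too old to support it; also, spark.read.jdbc() already returns a DataFrame, so the trailing .load() is not needed. A minimal sketch of the read, assuming placeholder host, database, and credentials and a 42.x-or-newer PostgreSQL JDBC driver attached to the cluster:

# Sketch only: placeholder connection details; requires a PostgreSQL JDBC driver
# new enough for scram-sha-256 (42.x or later) on the cluster.
df = spark.read.jdbc(
    url="jdbc:postgresql://hostname:5432/dbname",
    table="schema.table",
    properties={
        "user": "user",
        "password": "password",
        "driver": "org.postgresql.Driver",
    },
)
df.printSchema()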

Latest Reply
simboss
New Contributor II
  • 2 kudos

But how are we going to do this for those who use Windows?

4 More Replies
PassionateDBD
by New Contributor II
  • 1175 Views
  • 1 replies
  • 1 kudos

Is it possible to create/update a non-DLT table in the init phase of a DLT task?

We have a DLT task that is written in Python. Is it possible to create or update a Delta table programmatically from inside a DLT task? The Delta table would not be managed from inside the DLT task because we never want to fully refresh that table. Th...

Latest Reply
PassionateDBD
New Contributor II
  • 1 kudos

Thanks for your reply @Retired_mod! I'm aware of the possibility to create or not create a table based on some parameter. What I'm trying to figure out is basically how to achieve the following: the DLT pipeline starts and logs some information to a delta tab...
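For illustration only, one rough way to log from a DLT pipeline into a table that DLT does not manage is to append to an ordinary Delta table from plain (non-@dlt) notebook code; whether that fits this pipeline depends on the setup, and the table and column names below are hypothetical:

# Hypothetical names; this table is not declared with @dlt, so DLT never refreshes it.
from pyspark.sql import functions as F

log_df = (spark.createDataFrame([("pipeline_start",)], ["event"])
          .withColumn("event_time", F.current_timestamp()))

(log_df.write
    .format("delta")
    .mode("append")
    .saveAsTable("ops.pipeline_run_log"))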

Hertz
by New Contributor II
  • 1451 Views
  • 0 replies
  • 0 kudos

Structured Streaming Event in Audit Logs

I am trying to monitor when a table is created or updated using the audit logs. I have found that structured streaming writes/appends are not captured in the audit logs. Am I missing something? Shouldn't this be captured as a Unity Catalog event? Eith...

Data Engineering
Audit Logs
structured streaming
srinivas_001
by New Contributor III
  • 1223 Views
  • 1 replies
  • 1 kudos

File trigger options -- cloudFiles.allowOverwrites

I have a job configured to run on file arrival. I have provided the path as the file arrival path: s3://test_bucket/test_cat/test_schema/ When a new Parquet file arrived in this path, the job triggered automatically and processed the file. In case of...

Latest Reply
srinivas_001
New Contributor III
  • 1 kudos

Hi Kaniz, thank you for the response. I am using Databricks Runtime 11.3 and have also checked the checkpoint and data source locations, which are properly configured. Still, I am unable to trigger the job. NOTE: incoming files are pushed to the AWS S3 location fr...
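For reference, a minimal Auto Loader sketch showing where cloudFiles.allowOverwrites would be set; the checkpoint location and target table below are hypothetical, and the source path is the one from the question:

# Sketch only: hypothetical checkpoint location and target table.
df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.allowOverwrites", "true")  # reprocess files that are overwritten in place
    .load("s3://test_bucket/test_cat/test_schema/"))

(df.writeStream
    .option("checkpointLocation", "s3://test_bucket/_checkpoints/test_schema/")
    .trigger(availableNow=True)
    .toTable("bronze.test_table"))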

Nisha2
by New Contributor II
  • 1909 Views
  • 1 replies
  • 0 kudos

Databricks spark_jar_task failed when submitted via API

Hello, we are submitting jobs to the Databricks cluster using the /api/2.0/jobs/create API and running a Spark Java application (a JAR that is submitted to this API). We are noticing the Java application is executing as expected; however, we see that the...

Data Engineering
API
Databricks
spark
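For reference, a rough sketch of what a single-task /api/2.0/jobs/create payload with a spark_jar_task can look like, submitted here with Python requests; the workspace URL, token, cluster settings, JAR path, and main class are all placeholders:

import requests

# All values below are placeholders for illustration.
payload = {
    "name": "spark-jar-example",
    "new_cluster": {
        "spark_version": "11.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2,
    },
    "libraries": [{"jar": "dbfs:/FileStore/jars/my-app.jar"}],
    "spark_jar_task": {
        "main_class_name": "com.example.Main",
        "parameters": ["--env", "dev"],
    },
}

resp = requests.post(
    "https://<workspace-url>/api/2.0/jobs/create",
    headers={"Authorization": "Bearer <token>"},
    json=payload,
)
print(resp.status_code, resp.text)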
Pragati_17
by New Contributor II
  • 2610 Views
  • 0 replies
  • 0 kudos

Passing parameters to a dataset in a Databricks Lakeview Dashboard

I have a date range filter in a Lakeview Dashboard, and I want a distinct count of the number of months in the selected date range and to divide it by one of the columns, which is used in a counter visualization. But passing parameters is not possible...

ElaPG
by New Contributor III
  • 2766 Views
  • 1 replies
  • 1 kudos

Cluster creation / unrestricted policy option

Hi, as a workspace admin I would like to disable cluster creation with the "no isolation" access mode. I created a custom policy for that, but I still have the option to create a cluster with the "unrestricted" policy. How can I make sure that nobody will creat...

Latest Reply
ElaPG
New Contributor III
  • 1 kudos

Hi, thank you for a very informative reply. To sum up, in order to enforce these suggestions: the first solution must be executed at the account level, and the second solution must be executed at the workspace level (workspace-level admin settings).

Gilg
by Contributor II
  • 1823 Views
  • 0 replies
  • 0 kudos

DLT Performance

Hi, context: I have created a Delta Live Tables pipeline in a UC-enabled workspace that is set to Continuous. Within this pipeline, I have a bronze layer which uses Auto Loader and reads files stored in an ADLS Gen2 storage account in JSON format. We received ...

William_Scardua
by Valued Contributor
  • 23393 Views
  • 2 replies
  • 0 kudos

How to estimate DataFrame size in bytes?

Hi guys, how do I estimate the size in bytes of my DataFrame (PySpark)? Any ideas? Thank you.

Latest Reply
BroData
New Contributor II
  • 0 kudos

Hi @Dribka @William_Scardua
import numpy
actual_size_of_each_columns = df.toPandas().memory_usage(deep=True).to_dict()
del actual_size_of_each_columns["Index"]
for key in actual_size_of_each_columns:
    print(f"Size of the Column `{key}` -> {actual_si...
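For completeness, two commonly used rough estimates, both approximations: the first reads Catalyst's size estimate through the internal _jdf handle (so it may change between Spark versions), and the second pulls the data to the driver via pandas, which only makes sense for small DataFrames:

# 1) Catalyst's estimate from the optimized plan (internal API, may change).
plan_bytes = int(df._jdf.queryExecution().optimizedPlan().stats().sizeInBytes().toString())
print(f"Plan-based estimate: {plan_bytes} bytes")

# 2) Pandas in-memory estimate (collects to the driver; small DataFrames only).
pandas_bytes = int(df.toPandas().memory_usage(deep=True).sum())
print(f"Pandas-based estimate: {pandas_bytes} bytes")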

1 More Replies
Abdul1
by New Contributor
  • 1480 Views
  • 1 replies
  • 0 kudos

How to output data from Databricks?

Hello, I am just starting with Databricks in Azure and I need to output data to an Affinity CRM system. Affinity has an API, and I am wondering whether there is any sort of automated / data-pipeline way to tell Databricks to just pump the data into ...

Latest Reply
Edthehead
Contributor III
  • 0 kudos

We need more info on the kind of data, the volume, and what the called API can handle. Calling an API for single records in parallel can be achieved using a UDF (see THIS). You need to be careful to batch the records so that the target API can handle the pa...
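As a sketch of the pattern described above (not Affinity-specific), a UDF that posts one JSON-serialized row per call; the endpoint, auth header, and status-tracking table are placeholders:

import requests
from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType

@F.udf(IntegerType())
def post_record(payload_json):
    # Placeholder endpoint and token; returns the HTTP status code per record.
    resp = requests.post(
        "https://api.example.com/records",
        headers={"Authorization": "Bearer <token>"},
        data=payload_json,
        timeout=30,
    )
    return resp.status_code

result = df.withColumn("api_status", post_record(F.to_json(F.struct(*df.columns))))
result.write.mode("append").saveAsTable("ops.api_call_status")  # action that triggers the calls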

george_ognyanov
by New Contributor III
  • 1342 Views
  • 1 replies
  • 1 kudos

Orchestrate jobs using a parameter set in a notebook

I am trying to orchestrate my Databricks Workflows tasks using a parameter I would set in a notebook. Given the workflow below, I am trying to set a parameter in the Cinderella task, which is a Python notebook. Once set, I would like to use this paramete...

Latest Reply
Panda
Valued Contributor
  • 1 kudos

Here's how we can proceed; follow the instructions below. In your previous task, depending on whether you're using Python or Scala, set the task value like this: dbutils.jobs.taskValues.set("check_value", "2"). In your if/else task, you must reference th...
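For reference, a minimal sketch of the set/get pair; "check_value" is the key from the reply, "Cinderella" is the upstream task name from the question, and in a Workflows if/else condition the value can be referenced as {{tasks.Cinderella.values.check_value}}:

# Upstream notebook task (e.g. the Cinderella task): publish the value.
dbutils.jobs.taskValues.set(key="check_value", value="2")

# Downstream notebook task: read it back; debugValue is used when running interactively.
check_value = dbutils.jobs.taskValues.get(
    taskKey="Cinderella", key="check_value", debugValue="0"
)
print(check_value)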

tyas
by New Contributor II
  • 2589 Views
  • 1 replies
  • 1 kudos

Defining Keys

Hello, I have a DataFrame in a Databricks notebook that I've already read and transformed using PySpark (Python). I want to create a table with defined keys (primary and foreign). What is the best method to do this: create a table and directly define key...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Remember that keys are informational only (they don't enforce data integrity). They are used for information in a few places (feature tables, online tables, Power BI modelling). The best approach is to define them in the CREATE TABLE syntax, for example: CRE...
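As an illustration of that syntax (table and column names are hypothetical; in Unity Catalog the constraints are informational and primary-key columns must be NOT NULL):

# Hypothetical catalog/schema/table names.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.customers (
        customer_id BIGINT NOT NULL,
        name STRING,
        CONSTRAINT customers_pk PRIMARY KEY (customer_id)
    )
""")

spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.orders (
        order_id BIGINT NOT NULL,
        customer_id BIGINT,
        CONSTRAINT orders_pk PRIMARY KEY (order_id),
        CONSTRAINT orders_customers_fk FOREIGN KEY (customer_id)
            REFERENCES main.sales.customers (customer_id)
    )
""")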

kickbuttowski
by New Contributor II
  • 1764 Views
  • 1 replies
  • 1 kudos

Resolved! Issue loading JSON files in the same container with different schemas

Could you tell whether this scenario will work or not? Scenario: I have a container that holds two different JSON files with different schemas, arriving in a streaming manner. I am using Auto Loader here to load the files incrementall...

Latest Reply
MichTalebzadeh
Valued Contributor
  • 1 kudos

The short answer is no. A single Spark Auto Loader stream typically cannot handle JSON files in a container with two different schemas by default. Auto Loader relies on schema inference to determine the data structure. It analyses a sample of data from files ass...
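One workaround consistent with this answer is to run two separate Auto Loader streams over the same container, each with its own file-name filter, schema location, and checkpoint; the paths, glob patterns, and table names below are hypothetical:

# Hypothetical paths, patterns, and table names.
def load_stream(pattern, schema_loc, checkpoint, target_table):
    df = (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("pathGlobFilter", pattern)
        .option("cloudFiles.schemaLocation", schema_loc)
        .load("abfss://container@account.dfs.core.windows.net/landing/"))
    return (df.writeStream
        .option("checkpointLocation", checkpoint)
        .trigger(availableNow=True)
        .toTable(target_table))

load_stream("orders_*.json", "/tmp/schemas/orders", "/tmp/chk/orders", "bronze.orders")
load_stream("events_*.json", "/tmp/schemas/events", "/tmp/chk/events", "bronze.events")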

Sans1
by New Contributor II
  • 2145 Views
  • 2 replies
  • 1 kudos

Delta table vs dynamic views

Hi, my current design is to host the gold layer as dynamic views with masking. I will have a couple of use cases that need the views to be queried with filters. Does this provide performance equal to tables (which have data skipping based on transactio...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 1 kudos

Hi @Sans1, have you only used masking, or have you used any row- or column-level access control? If it's only masking, then you should go with a Delta table; if it's row- or column-level access control, then you should prefer dynamic views.
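For reference, a minimal dynamic-view sketch of the masking pattern being discussed; the view, source table, column, and group names are hypothetical:

# Hypothetical names; is_account_group_member drives the masking decision.
spark.sql("""
    CREATE OR REPLACE VIEW gold.customers_masked AS
    SELECT
        customer_id,
        CASE
            WHEN is_account_group_member('pii_readers') THEN email
            ELSE '***masked***'
        END AS email
    FROM silver.customers
""")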

1 More Replies
colinsorensen
by New Contributor III
  • 1842 Views
  • 1 replies
  • 0 kudos

Upgrading to UC: "Parent external location for path `s3://____` does not exist" ...but it does?

Topic: I am trying to upgrade some external tables in our Hive metastore to Unity Catalog. I used the upgrade functionality in the UI, as well as its provided SQL: CREATE TABLE `unity_catalog`.`default`.`table` LIKE `hive_metastore`.`schema`.`tab...

Latest Reply
colinsorensen
New Contributor III
  • 0 kudos

When I tried to edit the location to include the dbfs component (CREATE TABLE `unity_catalog`.`default`.`table` LIKE `hive_metastore`.`schema`.`table` LOCATION 'dbfs:/mnt/foobarbaz'), I get a new error: "[UPGRADE_NOT_SUPPORTED.UNSUPPORTED_FILE_SCHEME]...

